Introduction

For my final independent project for BIOL 551/L: Computer Modeling, I decided to use Tidy Tuesday Animal Crossing New Horizons user and critic review data. With this data I will be doing asking three questions:
1. What words are commonly used in user and critic reviews?
2. What is the sentimental value of words for user and critic reviews?
3. How did bombing user reviews affect overall grade?
w/ Special Guest Tom Nook

Loading in the libraries

library(tidyverse)
library(here)
library(lubridate)
library(tidytext)
library(flair)
library(wordcloud2)
library(DescTools)
library(webshot)
library("htmlwidgets")

Reading and saving all the data

critic <- readr::read_tsv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-05/critic.tsv')
#write.csv(critic, 'criticdfACNH.csv')

user_reviews <- readr::read_tsv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-05/user_reviews.tsv')
#write.csv(user_reviews,'userreviewsACNH.csv')

Text Analysis: What words are commonly used in User and Critic Reviews for Animal Crossing: New Horizons?

Common User Words:

When running this chunk, it yields a tibble table of user reviews where the words are descending based on the average grade (1-10). For some odd reason the word “bombing” was associated with high average reviews. I looked further into it because there are no bombs in the game, so why are people saying it in the reviews?

#wrangle data for a word cloud
user_review_words<- user_reviews %>% # calling in data set and creating a new one for the wrangling i will be doing
  unnest_tokens(word, text) %>% #splitting text column into tokens aka words
  anti_join(stop_words, by = "word") %>% #not counting stop words from the english words
  count(user_name, date, grade, word) # count (n) based on usernames words

user_review_words %>% # call in data set
  group_by(word) %>% # group by word
  summarize(avg_grade = mean(grade), #summarize based on average grade and the amount of reviews each word is in 
            nb_reviews = n()) %>% 
  filter(nb_reviews >=50) %>% #filter the number of reviews to >= 50
  arrange(desc(avg_grade)) #arrange in descending order based on avg_grade
## # A tibble: 296 x 3
##    word        avg_grade nb_reviews
##    <chr>           <dbl>      <int>
##  1 bombing          8.64         80
##  2 fantastic        8.46         84
##  3 charming         8.29         52
##  4 complaining      8.09         56
##  5 series           7.83        223
##  6 amazing          7.58        212
##  7 perfect          7.56        134
##  8 juego            7.54         85
##  9 es               7.49         65
## 10 relaxing         7.47        168
## # ... with 286 more rows

There was a bombing?

Yes you heard it here folks, there was indeed a review bombing during the first few weeks of the games release. With this chunk I will pipe the dataset to show me the reviews that contain the word “bombing”

bombing_reviews<- user_reviews %>% #calling in orginal data and saving a new one for just the bombing reviews
  filter(str_detect(text,"bombing")) #detect anywhere in the data set where it mentions "bombing"
  • User E_Z_E states, “Ignore the review bombing haters.”
  • User Sorites states, “People who review bombing because they can’t have separate islands per switch are just being childish.”
  • User Earthquakemass states, “I dont see why people are coming in a review bombing it with 0’s metacritic should do something about this because its really becoming a problem.”

TO BE CONTINUED IN DATA ANALYSIS :)…………………………….

Making a Wordcloud: wHaT dId ThE uSeRs SaY?

When making the WordCloud, I noted that there were a couple of words that kept repeating such as the title of the game Animal Crossing: New Horizons so I decided to leave them out of the wordcloud by adding a filter of less than or equal to 730 user reviews.

userwords<-user_review_words %>% #making the data set for the wordcloud
  count(word) %>% #count the number of words
  arrange(desc(n)) %>% #arrange the words from descending order based on (n)
  filter(n <=720) %>% #added filter for only showing less than 730 reviews
  slice(1:100)

wordcloud2(userwords, shape = 'star', size = 0.25, minSize = 5, rotateRatio = 0, color = rep_len(isabelle.colors, nrow(userwords)),  backgroundColor = isabelle.background, shuffle = FALSE) #creating word cloud with customizations

Common Critic Words:

Similar to the User Reviews for ACNH, Critic reviews also had a couple of words that came up consistently such as the title of the game, and the word “game”. I filtered the word critic_review_words data so that the number of reviews must be less than or equal to 25.

#wrangle critic data 
critic_review_words<- critic %>%#calling in critic data and naming it to something specific for wrangling
  unnest_tokens(word, text) %>% #unnest words from text column and create seperate column for each word 
  anti_join(stop_words, by = "word") %>% #do not add stop words to the word column 
  count(publication, date, grade, word) #count based on publications each with their words
######## separate pipe so that it is possible to view when knit ###########
critic_review_words %>% #calling in dataframe just wrangled
  group_by(word) %>% #group by word
  summarize(avg_grade = mean(grade), #summarize function for average grade and the number of reviews by n
            nb_reviews = n()) %>% 
  filter(nb_reviews <=25) %>% #filter so that the number of reveiws is less than 25
  arrange(desc(nb_reviews)) #arrange table by descending nb_reviews
## # A tibble: 1,055 x 3
##    word       avg_grade nb_reviews
##    <chr>          <dbl>      <int>
##  1 nintendo        92.1         21
##  2 time            92.1         21
##  3 experience      91.5         17
##  4 life            93.6         17
##  5 switch          91.5         17
##  6 world           94.1         15
##  7 perfect         93.5         13
##  8 franchise       89.6         12
##  9 fun             87.8         12
## 10 games           92.7         12
## # ... with 1,045 more rows

On the bright side, there is no “bombing” here so the data analysis won’t look too wonky when we look at the reviews holistically.

Making a WordCloud: WhAt DiD tHe CrItIcS sAy?

criticloud<- critic_review_words %>% #creating data set for wordcloud
  count(word) %>% #count (n) word
  filter(n <=25) %>% #filter so that n is less than or equal to 25
  arrange(desc(n)) %>% # arrange so that it is in descending order of (n)
  slice(1:100)

critc.cloud<-wordcloud2(criticloud, shape = 'star', size = .3, minSize = 5, rotateRatio=0, shuffle = FALSE, color = rep_len(nook.colors, nrow(criticloud)), backgroundColor = nook.background) # this doesnt show when knit but does show in output folder and as runned chunk
saveWidget(critc.cloud, "crit.html", selfcontained = F)
webshot("crit.html", "crit_1.png", delay = 5, vwidth = 480, vheight = 480)

After creating both word clouds, I am still having trouble trying to determine what exactly is the difference between the user and critic clouds. The clouds look relatively similar; both have some matching words such as, “fun, experience, crafting, and time.” So I beg the question . . .

Sentiment Text Analysis: What is the sentimental value of words for user and critic reviews?

For the sentiment analysis, I will be using the already wrangled data previously used for the word clouds, user_review_words and critic_review_words, to see if there is any positive or negative connotation to their words in their reviews.

User Sentiment Analysis

I wrangled the user_review_words data set for sentiment analysis by using the inner join function to get the sentiment value (positive or negative). Finally I used the count function to count the amount of times the words are being said and to sort by their sentiment.

user_sent_word_counts <- user_review_words %>% #calling in user review words and wrangling more for the sentiment analysis, called it something new
  inner_join(get_sentiments()) %>% #join in new column and get sentiments for each word +/-
  count(word, sentiment, sort = TRUE) #count based on word, sentiment, and then sort the words

Creating a Sentimental Plot :(: Users

To represent the sentiment analysis, I used the new the data set I just edited and decided to filter the amount of times a word is being used to greater than 80 because there are a lot of repeated words. I then mutated the n column so that if the sentiment is negative then the value of n will be negative too. Finally created a ggplot with columns and customized it.

user_sent_word_counts %>% #calling in new dataset
  filter(n>80) %>%  #filter the amount of time the word appears, greater than 80
  mutate(n = ifelse(sentiment == "negative",-n,n)) %>% #if the word is negative, make the number negative
  mutate(word = reorder(word, n)) %>% # reorder based on (n)
  #slice(1:100) %>% 
  ggplot(aes(word, n, fill = sentiment))+ #creating ggplot
  geom_col(color = "black")+ #columns
  coord_flip()+ #flip the x and y coords
  labs(x = "Word",y = "Contribution to Sentiment", #customizing graph
       title = "wHaT dId ThE uSeRs MeAn?", caption = "By: Kevin Candray")+ #labels
  scale_fill_manual(values = c("#bd8549", "#ffcf4f"))+
  theme_bw()+
  theme(axis.text = element_text(size=12),
        axis.title = element_text(size=16),
        axis.text.y = element_text(size = 10, color = "#000000"),
        axis.text.x = element_text(size = 9, color ="#000000" ),
        panel.grid.major = element_blank(),
        plot.caption = element_text(hjust = 0.5),
        plot.title = element_text(hjust = 0.5),
        panel.grid.minor = element_blank(),
        panel.background = element_rect(fill ="#f7f4d5" ),
        plot.background = element_rect(fill = "#eed3a6"),
        legend.background = element_rect(fill = "#eed3a6"))

Mocking Intensifies

In this graph, it is clear that the users were easily split between the game, but why? Many users are saying the game is “fun” and “[enjoyable]” yet so many other users are saying it’s “bad” and “limited”; what is it? The positive sentiments do outweigh the negative sentiments but not by much could the bombing reviews have anything to do with this?

Critic Sentiment Analysis

I ran the same code for the critic and user sentiment analysis, I had to switch some parts of the code around such as the dataframe and filters.

critic_sent_word_counts <- critic_review_words %>% #calling in previous critic data set and name it ... word_counts
  inner_join(get_sentiments()) %>%  #join the words based on sentiment
  count(word, sentiment, sort = TRUE) #count the number of word, sentiment and sort by positive and negative

Creating a sentimental plot :(: Critics

In this filter, I chose n to be greater than 2 because there were not that many reviews like the users.It seemed like each review was very unique because not many had repeating words like the user reviews.

critic_sent_word_counts %>% #calling in new data set just made
  filter(n>2) %>% #filter for (n) is greater than 2 because a majority of them are 2 or less
  mutate(n = ifelse(sentiment == "negative",-n,n)) %>% #mutate the value of n is the sentiment is negative to a negative n value
  mutate(word = reorder(word, n)) %>% # change word column so that is reordered based on (n)
  ggplot(aes(word, n, fill = sentiment))+ #start gg plot with x and y, fill with sentiment 
  geom_col(color = "black")+ # column graph and making lines black 
  coord_flip()+ #flip x and y axis
  labs(x = "Word",y = "Contribution to Sentiment", #customizing graph
       title = "wHaT dId ThE cRiTiCs MeAn?", caption = "By: Kevin Candray")+ #labels
  scale_fill_manual(values = c("#74bfd4", "#03a279"))+
  theme_bw()+
  theme(axis.text = element_text(size=12),
        axis.title = element_text(size=16),
        axis.text.y = element_text(size = 10, color = "#000000"),
        axis.text.x = element_text(size = 9, color ="#000000" ),
        panel.grid.major = element_blank(),
        plot.caption = element_text(hjust = 0.5),
        plot.title = element_text(hjust = 0.5),
        panel.grid.minor = element_blank(),
        panel.background = element_rect(fill ="#f7f4d5" ),
        plot.background = element_rect(fill = "#fef9f8"),
        legend.background = element_rect(fill = "#fef9f8"))

Positivity

In the critic data set, we are not dealing with as many reviews as the users, however these publishers have built a reputation to give their best review possible. When the critics reviewed the Animal Crossing: New Horizons, there was clearly a positive attitude of the game from the critics. Many thought it was “perfect” and “fun” game while only a few had negative sentiments about the it.

Data Analysis: How did the bombing User reviews of Animal Crossing: New Horizons affect it’s overall grade?

User Grade Review

To clarify what “bombing reviews” meant, I used the user_reviews dataframe and piped it into a histogram ggplot with the grade from the review on the x axis and the number of reviews on the y axis.

user_reviews %>% #calling in original user review data
  ggplot(aes(grade))+ #plotting the grade of ACNH review on x axis
  geom_histogram(fill = "#7fa9ff", color= "#000000")+ #histogram with customs
      labs(x = "Grade from Review (1-10)",y = "Number of Reviews",
       title = "User Review Grades in 2020 for AC:NH ", caption = "By: Kevin Candray")+ #labels
  theme_bw()+
  #scale_x_discrete()+
  theme(axis.text = element_text(size=12),
        axis.title = element_text(size=16),
        axis.text.y = element_text(size = 10, color = "#000000"),
        axis.text.x = element_text(size = 9, color ="#000000" ),
        panel.grid.major = element_blank(),
        plot.caption = element_text(hjust = 0.5),
        plot.title = element_text(hjust = 0.5),
        panel.grid.minor = element_blank(),
        panel.background = element_rect(fill ="#f7b4d2" ),
        plot.background = element_rect(fill = "#ff9592"),
        legend.background = element_rect(fill = "#ff9592"), 
        legend.key = element_rect(fill = "#ff9592"),
        legend.title = element_text(size = 10, color = "#000000"))

BOOM

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   2.000   4.217  10.000  10.000

Because of the overwhelming amount of users that gave AC:NH a 0 out of 10 the stats say that the average grade for this video game is 4.2 out of 10.

User Review Timeline

For this chunk, I created a timeline of when the reviews where posted along with their average grades to see when the significant amount of 0’s first appeared.

user_reviews %>% 
  group_by(week = floor_date(date, "week")) %>% 
  summarize(nb_reviews = n(),
            avg_grade = mean(grade)) %>%  # couldnt do month because it camew out mid march
  ggplot(aes(week, avg_grade))+
  geom_line(color = "#7fa9ff", size = 1.2)+
  geom_point(aes(size=nb_reviews),
             color = "#0123ac")+
  expand_limits(y=0)+
    labs(x = "Date(2020)",y = "Grade from AC:NH Review (1-10)",
       title = "User Average Review Grades in 2020 for AC:NH ", caption = "By: Kevin Candray")+ #labels
  theme_bw()+
  theme(axis.text = element_text(size=12),
        axis.title = element_text(size=16),
        axis.text.y = element_text(size = 10, color = "#000000"),
        axis.text.x = element_text(size = 9, color ="#000000" ),
        panel.grid.major = element_blank(),
        plot.caption = element_text(hjust = 0.5),
        plot.title = element_text(hjust = 0.5),
        panel.grid.minor = element_blank(),
        panel.background = element_rect(fill ="#f7b4d2" ),
        plot.background = element_rect(fill = "#ff9592"),
        legend.background = element_rect(fill = "#ff9592"), 
        legend.key = element_rect(fill = "#ff9592"),
        legend.title = element_text(size = 10, color = "#000000"))

Right as the game released, many fans supported the game with good reviews, however by the end of March there was a significant amount of reviews that graded it 0. People were extremely disappointed with the game because the game felt unfinished to the fans.

As a player of this game, I understand why there is a dip in the grade after late April. During the whole month of April, players are presented with a special event in the game called “bunny day” and it ruins the progress of the game because the event lasted about the entire month. People were not having it near the end of April, or so it shows.

Critc Grade Review

Similar to the User Grade Review, I did the same for the critic reviews.

critic %>% 
  ggplot(aes(grade))+
  geom_histogram(color = "black", fill ="#047f7d" )+
        labs(x = "Grade from AC:NH Review",y = "Number of Reviews",
       title = "Critic Review Grade in 2020 for AC:NH ", caption = "By: Kevin Candray")+ #labels
  theme_bw()+
  theme(axis.text = element_text(size=12),
        axis.title = element_text(size=16),
        axis.text.y = element_text(size = 10, color = "#000000"),
        axis.text.x = element_text(size = 9, color ="#000000" ),
        panel.grid.major = element_blank(),
        plot.caption = element_text(hjust = 0.5),
        plot.title = element_text(hjust = 0.5),
        panel.grid.minor = element_blank(),
        panel.background = element_rect(fill ="#fffcc6" ),
        plot.background = element_rect(fill = "#f4f433"),
        legend.background = element_rect(fill = "#f4f433"), 
        legend.key = element_rect(fill = "#f4f433"),
        legend.title = element_text(size = 10, color = "#000000"))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   70.00   90.00   90.00   90.64   94.00  100.00

The summary of the grades from the reviews yields that the critics, on average, gave it a score of 90 out of 100. Because publications didn’t experience a review bombing the results were skewed right.

Critic Review Timeline

I used the same code from the User review to reprsent the Critic reviews similarly.

critic %>% 
  group_by(week = floor_date(date, "week")) %>% 
  summarize(nb_reviews = n(),
            avg_grade = mean(grade)) %>%  # couldnt do month because it came out mid march
  ggplot(aes(week, avg_grade))+
  geom_line(color = "#047f7d", size = 1.2)+
  geom_point(aes(size=nb_reviews),
             color = "#64ba60")+
  expand_limits(y=0)+
    labs(x = "Date (2020)",y = "Average Grade Out of 100", #customs for graph
       title = "User Average Review Ratings in 2020 for AC:NH ", caption = "By: Kevin Candray")+ #labels
  theme_bw()+
  theme(axis.text = element_text(size=12),
        axis.title = element_text(size=16),
        axis.text.y = element_text(size = 10, color = "#000000"),
        axis.text.x = element_text(size = 9, color ="#000000" ),
        panel.grid.major = element_blank(),
        plot.caption = element_text(hjust = 0.5),
        plot.title = element_text(hjust = 0.5),
        panel.grid.minor = element_blank(),
        panel.background = element_rect(fill ="#fffcc6" ),
        plot.background = element_rect(fill = "#f4f433"),
        legend.background = element_rect(fill = "#f4f433"), 
        legend.key = element_rect(fill = "#f4f433"),
        legend.title = element_text(size = 10, color = "#000000"))

One could say that there was a small review bombing at the very beginning, when the game released, however they were not all zero’s with negative sentiments. The trend continued to thrive for critic reviews bu having positive words to say followed by a high grade.


THANK YOU!